auto_explore
Machine learning practitioners first need to identify signal in their datasets before building models. The primary goal of auto-explore is to establish a codebase that reduces the effort needed to produce a reasonable first-pass exploratory data analysis (EDA) for a variety of dataset types.
This Python library is a first attempt at automating the process of exploratory data analysis, at least as far as computation and visualization are concerned.
Critical thinking is not included.
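To illustrate the computational side of a first-pass EDA, here is a minimal sketch that builds a per-column summary table with plain pandas. The function name `first_pass_summary` and the particular statistics it reports are hypothetical choices for illustration. They are not part of auto-explore's API.

```python
import pandas as pd


def first_pass_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic diagnostics (hypothetical helper)."""
    summary = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "frac_missing": df.isna().mean(),
            "n_unique": df.nunique(dropna=True),
        }
    )
    # Add mean/std/min/max for numeric columns only; non-numeric rows get NaN.
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        stats = numeric.agg(["mean", "std", "min", "max"]).T
        summary = summary.join(stats)
    return summary


if __name__ == "__main__":
    example = pd.DataFrame(
        {
            "age": [23, 35, None, 52],
            "city": ["NYC", "LA", "NYC", None],
            "churned": [0, 1, 0, 1],
        }
    )
    print(first_pass_summary(example))
```

A table like this surfaces missing data, constant or high-cardinality columns, and the scale of numeric features. It does not replace the judgment the text above says is left to the analyst.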
This work relies on many excellent open-source Python libraries. Some of the inspirations and instrumental tools have been:
pandas-profiling: This project was ultimately not used in this library because it had been neglected.
featexp: A great tool for visualizing univariate analyses of a target, useful before building ML models. This library was integrated into auto-explore to experiment with extending the work, and it is used to produce univariate plots and feature selections.
matplotlib: The bare-bones, go-to visualization package for Python. This library relies on it heavily.
seaborn: Builds on matplotlib, abstracting away its details by performing statistical analysis alongside plot generation. This package is also relied upon heavily.
pandas: The go-to data-wrangling tool for Python. pandas' pd.DataFrame object, as well as many of its tseries (time-series) capabilities, are leveraged.
sklearn: An indispensable machine learning library that every data scientist should be proficient in. Not all of the desired functionality from this package has been integrated into auto-explore yet, but there are plans to leverage it more in the future.

Example plots: lmplot, plot_tseries_over_group_with_histograms, scatterplotmatrix, target_distribution_over_binary_groups, cluster_and_plot_pca1, cluster_and_plot_pca2
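The univariate target analysis that featexp is used for rests on a simple idea. You bin a feature into roughly equal-population bins and compare the target across bins. The sketch below reimplements that idea in plain pandas rather than calling featexp. The function name `binned_target_mean` and its parameters are hypothetical. It returns the sample count and mean target for each quantile bin of one feature.

```python
import numpy as np
import pandas as pd


def binned_target_mean(
    df: pd.DataFrame, feature: str, target: str, n_bins: int = 5
) -> pd.DataFrame:
    """Quantile-bin `feature` and summarize `target` per bin (hypothetical helper)."""
    # Drop rows where either the feature or the target is missing.
    data = df[[feature, target]].dropna()
    # Equal-population bins; duplicates="drop" merges bins whose edges coincide,
    # which happens with heavily repeated values.
    bins = pd.qcut(data[feature], q=n_bins, duplicates="drop")
    grouped = data.groupby(bins, observed=True)[target]
    return pd.DataFrame(
        {"n_samples": grouped.size(), "target_mean": grouped.mean()}
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1_000)
    # Synthetic binary target whose probability of being 1 rises with x.
    y = (rng.random(1_000) < 1 / (1 + np.exp(-2 * x))).astype(int)
    print(binned_target_mean(pd.DataFrame({"x": x, "y": y}), "x", "y"))
```

A steadily rising or falling `target_mean` across bins hints that the feature carries signal about the target. A flat profile suggests it may be a candidate for removal during feature selection.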